1. Introduction

Historically, the Karhunen-Loeve decomposition arose as a tool at the interface of
probability theory and information theory; see the details and references later in
the paper. It has served as a powerful tool in a variety of applications, starting
with the problem of separating variables in stochastic processes, say $X_t$; these
are processes that arise from statistical noise, for example from fractional
Brownian motion. Since its inception in mathematical statistics, the operator
algebraic content of the arguments has crystallized as follows: starting from the
process $X_t$, for simplicity assume zero mean, i.e., $E(X_t) = 0$; create a
correlation matrix $C(s,t) = E(X_s X_t)$. (Strictly speaking, it is not a matrix,
but rather an integral kernel. Nonetheless, the matrix terminology has stuck.) The
next key analytic step in the Karhunen-Loeve method is then to apply the Spectral
Theorem from operator theory to a corresponding selfadjoint operator, or to some
operator naturally associated with $C$: hence the name, the Karhunen-Loeve
Decomposition (KLD). In favorable cases (discrete spectrum), an orthogonal family
of functions $(f_n(t))$ in the time variable arises, together with a corresponding
family of eigenvalues. We take them to be normalized in a suitably chosen
square-norm. By integrating the basis functions $f_n(t)$ against $X_t$, we get a
sequence of random variables $Y_n$. It was the insight of Karhunen-Loeve [25] to give
general conditions for when this sequence of random variables is independent, and
to show that if the initial random process $X_t$ is Gaussian, then so are the random
variables $Y_n$. (See also Example 3.1 below.)

In the 1940s, Kari Karhunen ([20], [21]) pioneered the use of spectral theoretic
methods in the analysis of time series, and more generally in stochastic processes.
This work was followed up by papers and books by Michel Loeve in the 1950s [25], and
in 1965 by R.B. Ash [3]. (Note that this theory precedes the surge of interest in
wavelet bases!)

As we outline below, all of these settings impose rather strong assumptions. We
argue that more modern applications dictate more general theorems, which we
prove in our paper. A modern tool from operator theory and signal processing
which we will use is the notion of frames in Hilbert space. More precisely, frames
are redundant "bases" in Hilbert space. They are called frames, but intuitively
they should be thought of as generalized bases. The reason for this, as we show, is
that they offer an explicit choice of a (non-orthogonal) expansion of vectors in the
Hilbert space under consideration.

In our paper, we rely on the classical literature (see e.g., [3]), and we accomplish
three things: (i) we extend the original Karhunen-Loeve idea to the case of
continuous spectrum; (ii) we give frame theoretic uses of the Karhunen-Loeve idea
which arise in various wavelet contexts and which go beyond the initial uses of
Karhunen-Loeve; and finally (iii) we give applications.

These applications in our case come from image analysis; specifically from the
problem of statistical recognition and detection in the presence of nonlinear
variance, for example variance due to illumination effects. Then the Karhunen-Loeve
Decomposition (KLD), also known as Principal Component Analysis (PCA), applies to
the intensity images. This is traditional in statistical signal detection and in
estimation theory. Adaptations to compression and recognition are of a more recent
vintage. In brief outline, each intensity image is converted into vector form.
(This is the simplest case of a purely intensity-based coding of the image, and it
is not necessarily ideal for the application of KL-decompositions.)

The ensemble of vectors used in a particular conversion of images is assumed to
have a multivariate Gaussian distribution, since human faces form a dense cluster
in image space. The PCA method generates a small set of basis vectors forming
subspaces whose linear combinations offer a better (or perhaps ideal) approximation
to the original vectors in the ensemble. In facial recognition, the new bases are
said to span intra-face and inter-face variations, permitting Euclidean distance
measurements to exclusively pick up changes in, for example, identity and expression.
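
The vectorization and PCA steps just described can be sketched numerically. The
following is only a minimal illustration under stated assumptions: the "images"
are synthetic 8x8 arrays built from three hidden factors, and the names kl_basis
and project are hypothetical helpers, not taken from the literature cited here.

```python
import numpy as np

def kl_basis(images, k):
    """Return the mean, the top-k K-L (PCA) basis vectors, and the variances.

    images: array of shape (num_images, height, width); hypothetical data.
    """
    X = images.reshape(len(images), -1).astype(float)  # one row per image
    mean = X.mean(axis=0)
    Xc = X - mean                                      # enforce zero mean
    # Rows of Vt are orthonormal eigenvectors of the sample covariance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k], s**2 / (len(X) - 1)

def project(image, mean, basis):
    """Coefficients of one image in the truncated K-L basis."""
    return basis @ (image.ravel() - mean)

rng = np.random.default_rng(0)
# Synthetic ensemble (assumption): 3 hidden factors plus small noise.
factors = rng.normal(size=(3, 64))
images = (rng.normal(size=(50, 3)) @ factors
          + 0.01 * rng.normal(size=(50, 64))).reshape(50, 8, 8)

mean, basis, variances = kl_basis(images, k=3)
coeffs = project(images[0], mean, basis)
approx = mean + coeffs @ basis                         # reconstruction from 3 numbers
target = images[0].ravel()
rel_err = np.linalg.norm(approx - target) / np.linalg.norm(target)
```

In this sketch a 64-pixel image is represented by three coefficients; Euclidean
distances between such coefficient vectors are what a recognition scheme of the
kind described above would compare.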

Our presentation will start with various operator theoretic tools, including frame
representations in Hilbert space. We have included more details and more
explanations than is customary in more narrowly focused papers, as we wish to cover
the union of four overlapping fields of specialization: operator theory,
information theory, wavelets, and physics applications.

While entropy encoding is popular in engineering [28], [33], [10], the choices made
in signal processing are often made more by trial and error than by theory.
Reviewing the literature, we found that the mathematical foundation of the current
use of entropy in encoding deserves closer attention.

In this paper we take advantage of the fact that Hilbert space and operator
theory form the common language of both quantum mechanics and of signal/image
processing. Recall first that in quantum mechanics, (pure) states as mathematical
entities "are" one-dimensional subspaces in complex Hilbert space $\mathcal{H}$, so
we may represent them by vectors of norm one. Observables "are" selfadjoint
operators in $\mathcal{H}$, and the measurement problem entails von Neumann's
spectral theorem applied to the operators.

In signal processing, time-series, or matrices of pixel numbers, may similarly be
realized by vectors in Hilbert space $\mathcal{H}$. The probability distribution of
quantum mechanical observables (state space $\mathcal{H}$) may be represented by
choices of orthonormal bases (ONBs) in $\mathcal{H}$ in the usual way (see e.g.,
[35]). In signal/image processing, because of aliasing, it is practical to
generalize the notion of ONB, and this takes the form of what is called "a system
of frame vectors"; see [7].

But even von Neumann's measurement problem, viewing experimental data as 
part of a bigger environment (see e.g., [13], [36], [15]) leads to basis notions more 
general than ONBs. They are commonly known as Positive Operator Valued Mea- 
sures (POVMs), and in the present paper we examine the common ground between 
the two seemingly different uses of operator theory in the separate applications. To 
make the paper presentable to two audiences, we have included a few more details 
than is customary in pure math papers. 

We show that parallel problems in quantum mechanics and in signal processing 
entail the choice of "good" orthonormal bases (ONBs). One particular such ONB 
goes under the name "the Karhunen-Loeve basis." We will show that it is optimal 
in three ways, and we will outline a number of applications. 

The problem addressed in this paper is motivated by consideration of the optimal 
choices of bases for certain analogue-to-digital (A-to-D) problems we encountered 
in the use of wavelet bases in image-processing (see [IB], [35], [53], [51]); but certain 
of our considerations have an operator theoretic flavor which we wish to isolate, as 
it seems to be of independent interest. 

There are several reasons why we take this approach. Firstly our Hilbert space 
results seem to be of general interest outside the particular applied context where 
we encountered them. And secondly, we feel that our more abstract results might 
inspire workers in operator theory and approximation theory. 

1.1. Digital Image Compression. In digital image compression, after the
quantization step (see Figure 1), entropy encoding is performed on a particular
image for more efficient storage, i.e., storage that uses less memory. When an
image is to be stored we need either 8 bits or 16 bits to store a pixel. With
efficient entropy encoding, we can use a smaller number of bits to represent a
pixel in an image, resulting in less memory used to store or even transmit an
image. Thus, the Karhunen-Loeve theorem enables us to pick the best basis, and
thereby to minimize the entropy and error, to better represent an image for optimal
storage or transmission. Here, optimal means that it uses the least memory space to
represent the data; e.g., instead of using 16 bits, it uses 11 bits. So, the best
basis found would allow us to better represent the digital image with less storage
memory.
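
To make the bit count concrete, here is a small sketch of the entropy bound for a
quantized image. The pixel list is a hypothetical example, and the function name
entropy_bits_per_pixel is ours; the only fact used is Shannon's formula for the
entropy of the empirical distribution of pixel values.

```python
import math
from collections import Counter

def entropy_bits_per_pixel(pixels):
    """Shannon entropy (in bits) of the empirical distribution of pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical quantized image: stored with 8 bits per pixel, but only four
# values occur, with relative frequencies 1/2, 1/4, 1/8, 1/8.
pixels = [0] * 8 + [64] * 4 + [128] * 2 + [255] * 2

h = entropy_bits_per_pixel(pixels)
print(f"fixed-length code: 8 bits/pixel, entropy bound: {h:.2f} bits/pixel")
```

For this toy histogram the bound is 1.75 bits per pixel, while an image using all
256 values equally often would need the full 8 bits. A transform that concentrates
the distribution therefore lowers the achievable storage cost.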
Figure 1. Outline of the wavelet image compression process. [28]



In our next section we give the general context and definitions from operators
in Hilbert space which we shall need: We discuss the particular orthonormal bases
(ONBs) and frames which we use, and we recall the operator theoretic context of
the Karhunen-Loeve theorem [3]. In approximation problems involving a stochastic
component (for example noise removal in time-series or data resulting from image
processing) one typically ends up with correlation kernels; in some cases as frame
kernels; see details in Section 4. In some cases they arise from systems of vectors
in Hilbert space which form frames (see Definition 4.2). In some cases parts of the
frame vectors fuse (fusion frames) onto closed subspaces, and we will be working
with the corresponding family of (orthogonal) projections. Either way, we arrive
at a family of selfadjoint positive semidefinite operators in Hilbert space. The
particular Hilbert space depends on the application at hand. While the Spectral
Theorem does allow us to diagonalize these operators, a direct application of the
Spectral Theorem may lead to continuous spectrum which is not directly useful in
computations, or it may not be computable by recursive algorithms. As a result
we introduce in Section 6 a weighting of the operator to be analyzed.

The questions we address are optimality of approximation in a variety of ONBs, 
and the choice of the "best" ONB. Here "best" is given two precise meanings: 
(1) In the computation of a sequence of approximations to the frame vectors, the 
error terms must be smallest possible; and similarly (2) we wish to minimize the 
corresponding sequence of entropy numbers (referring to von Neumann's entropy). 
In two theorems we make precise an operator theoretic Karhunen-Loeve basis, which 
we show is optimal both in regards to criteria (1) and (2). But before we prove our 
theorems, we give the two problems an operator theoretic formulation; and in fact 
our theorems are stated in this operator theoretic context. 

In Section 5 we introduce the weighting, and we address a third optimality
criterion, that of optimal weights: Among all the choices of weights (taking the
form of certain discrete probability distributions) turning the initially given
operator into a trace-class operator, the problem is then to select the particular
weights which are optimal in a sense which we define precisely.

2. General Background



2.1. From Data to Hilbert Space. In computing probabilities and entropy,
Hilbert space serves as a helpful tool. As an example, take a unit vector $f$ in
some fixed Hilbert space $\mathcal{H}$, and an orthonormal basis (ONB) $(\psi_i)$
with $i$ running over an index set $I$. With this we now introduce two families of
probability measures, one family $P_f(\cdot)$ indexed by $f \in \mathcal{H}$, and a
second family $P_T$ indexed by a class of operators
$T : \mathcal{H} \to \mathcal{H}$.

2.1.1. The measures $P_f$. Define

(2.1)    $P_f(A) = \sum_{i \in A} |\langle \psi_i | f \rangle|^2$

where $A \subset I$, and where $\langle \cdot | \cdot \rangle$ denotes the inner
product. Following physics convention, we make our inner product linear in the
second variable. That will also let us make use of Dirac's convenient notation for
rank-one operators, see eq. (2.5) below.

Note then that $P_f(\cdot)$ is a probability measure on the finite subsets $A$ of
$I$; the total mass is one because $\|f\| = 1$ and $(\psi_i)$ is an ONB (Parseval's
identity). To begin with, we make the restriction to finite subsets. This is merely
for later use in recursive systems, see e.g., eq. (2.2). In diverse contexts,
extensions from finite to infinite subsets are then done by means of Kolmogorov's
consistency principle [23].

By introducing a weighting we show that this assignment also works for more
general vector configurations $\mathcal{C}$ than ONBs. Vectors in $\mathcal{C}$ may
represent signals or image fragments/blocks. Correlations would then be measured as
inner products $\langle u | v \rangle$ with $u$ and $v$ representing different image
pixels. Or in the case of signals, $u$ and $v$ might represent different frequency
subbands.
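
As a small sanity check of the measures $P_f$, the sketch below evaluates
$P_f(A) = \sum_{i \in A} |\langle \psi_i | f \rangle|^2$ in $\mathbb{C}^3$. The
particular ONB and unit vector are hypothetical choices made for illustration; the
helper names inner and P_f are ours.

```python
import math

def inner(u, v):
    """Inner product, conjugate-linear in the first slot, linear in the second."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def P_f(f, onb, A):
    """P_f(A): sum over i in A of |<psi_i | f>|^2."""
    return sum(abs(inner(onb[i], f)) ** 2 for i in A)

# Hypothetical data: an ONB of C^3 and a unit vector f.
s = 1 / math.sqrt(2)
onb = [(s, s, 0), (s, -s, 0), (0, 0, 1)]
f = (0.6, 0.8j, 0.0)

p = [P_f(f, onb, {i}) for i in range(3)]   # masses of the singletons
total = P_f(f, onb, {0, 1, 2})             # Parseval: equals ||f||^2 = 1
```

Here the singleton masses are 1/2, 1/2 and 0, and additivity over disjoint index
sets is immediate from the definition.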

2.1.2. The measures $P_T$. A second, more general family of probability measures
arising in the context of Hilbert space is that of determinantal measures.
Specifically, consider bitstreams as points in an infinite Cartesian product
$\Omega = \prod_{i \in \mathbb{N}} \{0, 1\}$. Cylinder sets in $\Omega$ are indexed
by finite subsets $A \subset \mathbb{N}$,

$C_A = \{(\omega_1, \omega_2, \ldots) \mid \omega_i = 1 \text{ for } i \in A\}$.

If $T$ is an operator in $\ell^2(\mathbb{N})$ such that
$0 \le \langle u | Tu \rangle \le \|u\|^2$ for all $u \in \ell^2$, then set

(2.2)    $P_T(C_A) = \det\bigl(T(i,j)\bigr)_{i,j \in A}$

where $T(i,j)$ is the matrix representation of $T$ computed in some ONB in
$\ell^2$. Using general principles [23], [19] it can be checked that $P_T(C_A)$ is
independent of the choice of ONB.

To verify that $P_T(\cdot)$ extends to a probability measure defined on the
sigma-algebra generated by the cylinder sets $C_A$, see e.g. [35], Ch. 7. The
argument is based on Kolmogorov's consistency principle, see [23].

Frames (Definition 4.2) are popular in analyzing signals and images. This fact
raises questions of comparing two approximations: one using a frame, the other
using an ONB. However, there are several possible choices of ONBs. An especially
natural choice of an ONB would be one that diagonalizes the matrix
$(\langle f_i | f_j \rangle)$ where $(f_i)$ is a frame. We call such a choice of
ONB a Karhunen-Loeve (K-L) expansion. Section 3 deals with a continuous version of
this matrix problem. The justification for why diagonalization is possible, also
when the frame $(f_i)$ is infinite, is based on the Spectral Theorem. For the
details regarding this, see the proof of Theorem 3.3 below.

In symbols, we designate the K-L ONB associated to the frame $(f_i)$ as $(\phi_i)$.
In computations, we must rely on finite sums, and we are interested in estimating
the errors when different approximations are used, and where summations are
truncated. Our main results make precise how the K-L ONB yields better
approximations, smaller entropy and better synthesis. Even more, we show that
infimum calculations yield minimum numbers attained at the K-L ONB expansions. We
emphasize that the ONB depends on the frame chosen, and this point will be discussed
in detail later.

If larger systems are subdivided, the smaller parts may be represented by
projections $P_i$, and the $i$-$j$ correlations by the operators $P_i P_j$. The
entire family $(P_i)$ is to be treated as a fusion frame [6]. Fusion frames are
defined in Definition 4.12 below. Frames themselves are generalized bases with
redundancy, for example occurring in signal processing involving multiplexing. The
fusion frames allow decompositions with closed subspaces as opposed to individual
vectors. They allow decompositions of signal/image processing tasks with degrees of
homogeneity.

2.2. Definitions. 

Definition 2.1. Let $\mathcal{H}$ be a Hilbert space. Let $(\psi_i)$ and $(\phi_i)$
be orthonormal bases (ONBs), with index set $I$. Usually

(2.3)    $I = \mathbb{N} = \{1, 2, \ldots\}$.

If $(\psi_i)_{i \in I}$ is an ONB, we set $Q_n :=$ the orthogonal projection onto
$\operatorname{span}\{\psi_1, \ldots, \psi_n\}$.

We now introduce a few facts about operators which will be needed in the paper.
In particular we recall Dirac's terminology [11] for rank-one operators in Hilbert
space. While there are alternative notations available, Dirac's bra-ket terminology
is especially efficient for our present considerations.

Definition 2.2. Let vectors $u, v \in \mathcal{H}$. Then

(2.4)    $\langle u | v \rangle$ = inner product $\in \mathbb{C}$,

(2.5)    $|u\rangle\langle v|$ = rank-one operator, $\mathcal{H} \to \mathcal{H}$,

where the operator $|u\rangle\langle v|$ acts as follows

(2.6)    $|u\rangle\langle v|\, w = |u\rangle\langle v | w\rangle = \langle v | w \rangle\, u$,  for all $w \in \mathcal{H}$.

Dirac's bra-ket and ket-bra notation is popular in physics, and it is especially
convenient in working with rank-one operators and inner products. For example,
in the middle term in eq. (2.6), the vector $u$ is multiplied by a scalar, the inner
product; and the inner product comes about by just merging the two vectors.

2.3. Facts. The following formulas reveal the simple rules for the algebra of
rank-one operators, their composition, and their adjoints.

(2.7)    $|u_1\rangle\langle v_1|\,|u_2\rangle\langle v_2| = \langle v_1 | u_2 \rangle\, |u_1\rangle\langle v_2|$

and

(2.8)    $|u\rangle\langle v|^* = |v\rangle\langle u|$.

In particular, formula (2.7) shows that the product of two rank-one operators is
again rank-one. The inner product $\langle v_1 | u_2 \rangle$ is a measure of a
correlation between the two operators on the LHS of (2.7).

If $S$ and $T$ are bounded operators in $\mathcal{H}$, i.e., in
$B(\mathcal{H})$, then

(2.9)    $S\,|u\rangle\langle v|\,T = |Su\rangle\langle T^* v|$.

If $(\psi_i)_{i \in \mathbb{N}}$ is an ONB then the projection

$Q_n := \operatorname{proj}\operatorname{span}\{\psi_1, \ldots, \psi_n\}$

is given by

(2.10)    $Q_n = \sum_{i=1}^{n} |\psi_i\rangle\langle\psi_i|$;

and for each $i$, $|\psi_i\rangle\langle\psi_i|$ is the projection onto the
one-dimensional subspace $\mathbb{C}\psi_i \subset \mathcal{H}$.
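
The rules (2.6) through (2.10) are easy to confirm numerically in finite
dimension. The sketch below does so in $\mathbb{C}^4$ with randomly drawn vectors
and matrices; the helper names braket and ketbra are ours, and the random data are
an assumption made purely for illustration.

```python
import numpy as np

def braket(u, v):
    """<u|v>, conjugate-linear in u and linear in v (np.vdot conjugates u)."""
    return np.vdot(u, v)

def ketbra(u, v):
    """The rank-one operator |u><v| as a matrix: w -> <v|w> u."""
    return np.outer(u, np.conj(v))

rng = np.random.default_rng(1)

def rand_vec(n=4):
    return rng.normal(size=n) + 1j * rng.normal(size=n)

u1, v1, u2, v2, w = (rand_vec() for _ in range(5))
S = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
T = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

ok_26 = np.allclose(ketbra(u1, v1) @ w, braket(v1, w) * u1)
ok_27 = np.allclose(ketbra(u1, v1) @ ketbra(u2, v2),
                    braket(v1, u2) * ketbra(u1, v2))
ok_28 = np.allclose(ketbra(u1, v1).conj().T, ketbra(v1, u1))
ok_29 = np.allclose(S @ ketbra(u1, v1) @ T, ketbra(S @ u1, T.conj().T @ v1))

# (2.10): Q_n built from the first n vectors of an ONB is a selfadjoint idempotent.
onb, _ = np.linalg.qr(S)                      # columns form an ONB of C^4
Q2 = sum(ketbra(onb[:, i], onb[:, i]) for i in range(2))
ok_210 = np.allclose(Q2 @ Q2, Q2) and np.allclose(Q2, Q2.conj().T)
```

Note that the convention "linear in the second variable" is what makes
`np.outer(u, np.conj(v))` the correct matrix for $|u\rangle\langle v|$.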

3. The Karhunen-Loeve transform

In general, one refers to a Karhunen-Loeve transform as an expansion in Hilbert
space with respect to an ONB resulting from an application of the Spectral Theorem.

Example 3.1. Suppose $X_t$ is a stochastic process indexed by $t$ in a finite
interval $J$, and taking values in $L^2(\Omega, P)$ for some probability space
$(\Omega, P)$. Assume the normalization $E(X_t) = 0$. Suppose the integral kernel
$E(X_t X_s)$ can be diagonalized, i.e., suppose that

$\int_J E(X_t X_s)\, \phi_k(s)\, ds = \lambda_k \phi_k(t)$

with an ONB $(\phi_k)$ in $L^2(J)$. If $E(X_t) = 0$ then

$X_t(\omega) = \sum_k \sqrt{\lambda_k}\, \phi_k(t)\, Z_k(\omega)$,  $\omega \in \Omega$,

where $E(Z_j Z_k) = \delta_{j,k}$, and $E(Z_k) = 0$. The ONB $(\phi_k)$ is called
the KL-basis with respect to the stochastic process $\{X_t : t \in J\}$.

The KL-theorem [3] states that if $(X_t)$ is Gaussian, then so are the random
variables $(Z_k)$. Furthermore, they are $N(0, 1)$, i.e., normal with mean zero and
variance one, so independent and identically distributed. This last fact explains
the familiar optimality of KL in transform coding.
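
As a numerical illustration of Example 3.1, the following sketch discretizes one
specific kernel, $E(X_t X_s) = \min(t, s)$ on $J = [0, 1]$ (standard Brownian
motion, chosen here as an assumed test case), and compares the computed
eigenvalues with the classical closed form
$\lambda_k = 1/((k - \tfrac12)^2 \pi^2)$. The grid size and the truncation level
are arbitrary choices.

```python
import numpy as np

N = 400
t = (np.arange(N) + 0.5) / N          # midpoints of a uniform grid on J = [0, 1]
h = 1.0 / N
C = np.minimum.outer(t, t)            # kernel E(X_t X_s) = min(t, s) (assumed example)

# Nystrom discretization: eigenpairs of C * h approximate (lambda_k, phi_k).
evals, evecs = np.linalg.eigh(C * h)
order = np.argsort(evals)[::-1]
lam = evals[order]
phi = evecs[:, order] / np.sqrt(h)    # normalize so that sum(phi_k**2) * h = 1

k = np.arange(1, 6)
lam_exact = 1.0 / ((k - 0.5) ** 2 * np.pi ** 2)

# One sample path from the truncated expansion X_t = sum_k sqrt(lam_k) phi_k(t) Z_k.
rng = np.random.default_rng(2)
K = 50
Z = rng.standard_normal(K)            # independent N(0, 1) variables Z_k
path = phi[:, :K] @ (np.sqrt(lam[:K]) * Z)
```

The leading discrete eigenvalues agree with the closed form to within the
discretization error, and the truncated sum produces an approximate sample path
from independent $N(0,1)$ coefficients, as in the Gaussian case of the KL-theorem.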

Remark 3.2. Consider the case when

$E(X_t X_s) = \tfrac{1}{2}\bigl(t^{2H} + s^{2H} - |t - s|^{2H}\bigr)$

and $H \in (0, 1)$ is fixed. If $J = \mathbb{R}$ in the above application of KL to
stochastic processes, then it is possible by a fractional integration to make the
$L^2(\mathbb{R})$-ONB consist of wavelets, i.e.,

$\psi_{j,k}(t) := 2^{j/2}\, \psi(2^j t - k)$,  $j, k \in \mathbb{Z}$ (i.e., double-indexed), $t \in \mathbb{R}$, for some $\psi \in L^2(\mathbb{R})$;

see e.g. [19]. The process $X_t$ is called $H$-fractional Brownian motion, as outlined
in e.g. PI] p.57. 

The following theorem makes clear the connection to Hilbert space geometry as
used in the present paper:

Theorem 3.3. Let $(\Omega, P)$ be a probability space, $J \subset \mathbb{R}$ an
interval (possibly infinite), and let $(X_t)_{t \in J}$ be a stochastic process with
values in $L^2(\Omega, P)$. Assume $E(X_t) = 0$ for all $t \in J$. Then $L^2(J)$
splits as an orthogonal sum

(3.1)    $L^2(J) = \mathcal{H}_d \oplus \mathcal{H}_c$

(d is for discrete and c is for continuous) such that the following data exists:

(a) $(\phi_k)_{k \in \mathbb{N}}$: an ONB in $\mathcal{H}_d$.

(b) $(Z_k)_{k \in \mathbb{N}}$: independent random variables.

(c) $E(Z_j Z_k) = \delta_{j,k}$, and $E(Z_k) = 0$.

(d) $(\lambda_k) \subset \mathbb{R}_{\ge 0}$.

(e) $\phi(\cdot, \cdot)$: a Borel measure on $\mathbb{R}$ in the first variable,
such that

(i) $\phi(E, \cdot) \in \mathcal{H}_c$ for $E$ an open subinterval of $J$,
and

(ii) $\langle \phi(E_1, \cdot) | \phi(E_2, \cdot) \rangle_{L^2(J)} = 0$ whenever $E_1 \cap E_2 = \emptyset$.

(f) $Z(\cdot, \cdot)$: a measurable family of random variables such that
$Z(E_1, \cdot)$ and $Z(E_2, \cdot)$ are independent when
$E_1, E_2 \in \mathcal{B}_J$ and $E_1 \cap E_2 = \emptyset$,

$E(Z(\lambda, \cdot) Z(\lambda', \cdot)) = \delta(\lambda - \lambda')$, and $E(Z(\lambda, \cdot)) = 0$.

Finally, we get the following Karhunen-Loeve expansions for the $L^2(J)$-operator
with integral kernel $E(X_t X_s)$:

(3.2)    $\sum_{k \in \mathbb{N}} \lambda_k\, |\phi_k\rangle\langle\phi_k| + \int_0^\infty \lambda\, |\phi(d\lambda, \cdot)\rangle\langle\phi(d\lambda, \cdot)|$.

Moreover, the process decomposes thus:

(3.3)    $X_t(\omega) = \sum_{k \in \mathbb{N}} \sqrt{\lambda_k}\, Z_k(\omega)\, \phi_k(t) + \int_0^\infty \sqrt{\lambda}\, Z(\lambda, \omega)\, \phi(d\lambda, t)$.

Proof. By assumption the integral operator in $L^2(J)$ with kernel $E(X_t X_s)$ is
selfadjoint, positive semidefinite, but possibly unbounded. By the Spectral Theorem,
this operator has the following representation:

$\int_0^\infty \lambda\, Q(d\lambda)$

where $Q(\cdot)$ is a projection valued measure defined on the Borel subsets
$\mathcal{B}$ of $\mathbb{R}_{\ge 0}$. Recall

$Q(S_1 \cap S_2) = Q(S_1) Q(S_2)$ for $S_1, S_2 \in \mathcal{B}$;

and $\int_0^\infty Q(d\lambda)$ is the identity operator in $L^2(J)$. The two closed
subspaces $\mathcal{H}_d$ and $\mathcal{H}_c$ in the decomposition (3.1) are the
discrete and continuous parts of the projection valued measure $Q$, i.e., $Q$ is
discrete (or atomic) on $\mathcal{H}_d$, and it is continuous on $\mathcal{H}_c$.
Consider first

$Q_d(\cdot) = Q(\cdot)|_{\mathcal{H}_d}$

and let $(\lambda_k)$ be the atoms. Then for each $k$, the non-zero projection
$Q(\{\lambda_k\})$ is a sum of rank-one projections $|\phi_k\rangle\langle\phi_k|$
corresponding to a choice of ONB in the $\lambda_k$ subspace. (Usually the
multiplicity is one, in which case
$Q(\{\lambda_k\}) = |\phi_k\rangle\langle\phi_k|$.) This accounts for the first
terms in the representations (3.2) and (3.3).

We now turn to the continuous part, i.e., the subspace $\mathcal{H}_c$, and the
continuous projection valued measure

$Q_c(\cdot) = Q(\cdot)|_{\mathcal{H}_c}$.

The second terms in the two formulas (3.2) and (3.3) result from an application of
a disintegration theorem from [12], Theorem 3.4. This theorem is applied to the
measure $Q_c(\cdot)$.

We remark for clarity that the term
$|\phi(d\lambda, \cdot)\rangle\langle\phi(d\lambda, \cdot)|$ under the integral sign
in (3.2) is merely a measurable field of projections $P(d\lambda)$. □

Our adaptation of the spectral theorem from books in operator theory (e.g.,
[19]) is made with a view to the application at hand, and our version of Theorem 3.3
serves to make the adaptation to how operator theory is used for time series, and
for encoding. We have included it here because it isn't written precisely this way
elsewhere.



4. Frame Bounds and Subspaces

The word 'frame' in the title refers to a family of vectors in Hilbert space with
basis-like properties which are made precise in Definition 4.2. We will be using
entropy and information as defined classically by Shannon [30], and extended to
operators by von Neumann [18].

The reference [3] offers a good overview of the basics of both. Shannon's
pioneering idea was to quantify digital "information," essentially as the negative
of entropy, entropy being a measure of "disorder." This idea has found a variety of
applications in both signal/image processing and in quantum information theory,
see e.g., [24]. A further recent use of entropy is in digital encoding of signals
and images, compressing and quantizing digital information into a finite
floating-point computer register. (Here we use the word "quantizing" [38], [3§],
[33] in the sense of computer science.) To compress data for storage, an encoding is
used which takes into consideration the probability of occurrences of the components
to be quantized; and hence entropy is a gauge for the encoding.

Definition 4.1. $T \in B(\mathcal{H})$ is said to be trace class if and only if the
series $\sum_i \langle \psi_i | T\psi_i \rangle$ is absolutely convergent for some
ONB $(\psi_i)$. In this case, set

(4.1)    $\operatorname{tr}(T) := \sum_i \langle \psi_i | T\psi_i \rangle$.

Definition 4.2. A sequence $(h_\alpha)_{\alpha \in A}$ in $\mathcal{H}$ is called a
frame if there are constants $0 < c_1 \le c_2 < \infty$ such that

(4.2)    $c_1 \|f\|^2 \le \sum_{\alpha \in A} |\langle h_\alpha | f \rangle|^2 \le c_2 \|f\|^2$  for all $f \in \mathcal{H}$.

Definition 4.3. Suppose we are given a frame operator

(4.3)    $G = \sum_{\alpha \in A} w_\alpha\, |f_\alpha\rangle\langle f_\alpha|$

and an ONB $(\psi_i)$. Then for each $n$, the numbers

(4.4)    $E_n^\psi = \sum_{\alpha \in A} w_\alpha \Bigl\| f_\alpha - \sum_{i=1}^{n} \langle \psi_i | f_\alpha \rangle\, \psi_i \Bigr\|^2$

are called the error-terms.
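Set $L : \mathcal{H} \to \ell^2$,

(4.5)    $L : f \mapsto (\langle h_\alpha | f \rangle)_{\alpha \in A}$.

Lemma 4.4. If $L$ is as in (4.5) then $L^* : \ell^2 \to \mathcal{H}$ is given by

(4.6)    $L^*((c_\alpha)) = \sum_{\alpha \in A} c_\alpha h_\alpha$

where $(c_\alpha) \in \ell^2$; and

(4.7)    $L^* L = \sum_{\alpha \in A} |h_\alpha\rangle\langle h_\alpha|$.

Lemma 4.5. If $(f_\alpha)$ are the normalized vectors resulting from a frame
$(h_\alpha)$, i.e. $h_\alpha = \|h_\alpha\| f_\alpha$, and
$w_\alpha := \|h_\alpha\|^2$, then $L^* L$ has the form

$L^* L = \sum_{\alpha \in A} w_\alpha\, |f_\alpha\rangle\langle f_\alpha| = \sum_{\alpha \in A} w_\alpha P_\alpha$

where $P_\alpha$ is the projection onto the one-dimensional subspace
$\mathbb{C} f_\alpha$.

Proof. The desired conclusion follows from the Dirac formulas (2.7) - (2.8). Indeed

$|h_\alpha\rangle\langle h_\alpha| = \bigl|\, \|h_\alpha\| f_\alpha \bigr\rangle \bigl\langle \|h_\alpha\| f_\alpha \bigr| = \|h_\alpha\|^2\, |f_\alpha\rangle\langle f_\alpha| = w_\alpha P_\alpha$

where $P_\alpha$ satisfies the two rules $P_\alpha = P_\alpha^* = P_\alpha^2$. □

Definition 4.6. Suppose we are given $(f_\alpha)_{\alpha \in A}$, a frame, and
non-negative numbers $\{w_\alpha\}_{\alpha \in A}$, where $A$ is an index set, with
$\|f_\alpha\| = 1$ for all $\alpha \in A$. Then

(4.8)    $G := \sum_{\alpha \in A} w_\alpha\, |f_\alpha\rangle\langle f_\alpha|$

is called a frame operator associated to $(f_\alpha)$.

Lemma 4.7. Note that $G$ is trace class if and only if
$\sum_\alpha w_\alpha < \infty$; and then

(4.9)    $\operatorname{tr} G = \sum_{\alpha \in A} w_\alpha$.

Proof. The identity (4.9) follows from the fact that all the rank-one operators
$|u\rangle\langle v|$ are trace class, with

$\operatorname{tr} |u\rangle\langle v| = \sum_{i=1}^{\infty} \langle \psi_i | u \rangle \langle v | \psi_i \rangle = \langle v | u \rangle$.

In particular, $\operatorname{tr} |u\rangle\langle u| = \|u\|^2$. □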
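
Lemmas 4.5 and 4.7 can be checked directly for a finite frame. In the sketch
below the seven vectors in $\mathbb{R}^3$ are random (a hypothetical example), the
analysis operator $L$ is the matrix whose rows are the $h_\alpha$, and the variable
names are ours.

```python
import numpy as np

rng = np.random.default_rng(3)
h = rng.normal(size=(7, 3))                 # 7 hypothetical frame vectors h_a (rows)

L = h                                       # analysis operator: (L f)_a = <h_a|f>
G = L.T @ L                                 # L*L = sum_a |h_a><h_a|, cf. (4.7)

w = np.sum(h**2, axis=1)                    # weights w_a = ||h_a||^2
f = h / np.sqrt(w)[:, None]                 # normalized vectors f_a
G_weighted = sum(w_a * np.outer(f_a, f_a) for w_a, f_a in zip(w, f))

trace_G = np.trace(G)                       # should equal sum_a w_a, cf. (4.9)
```

Both descriptions of the frame operator give the same matrix, and its trace is the
sum of the weights.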
We shall consider more general frame operators

(4.10)    $G = \sum_{\alpha \in A} w_\alpha P_\alpha$

where $(P_\alpha)$ is an indexed family of projections in $\mathcal{H}$, i.e.,
$P_\alpha = P_\alpha^* = P_\alpha^2$, for all $\alpha \in A$. Note that $P_\alpha$
is trace class if and only if it is finite-dimensional, i.e., if and only if the
subspace $P_\alpha \mathcal{H} = \{x \in \mathcal{H} \mid P_\alpha x = x\}$ is
finite-dimensional.

When $(\psi_i)$ is given, set $Q_n := \sum_{i=1}^{n} |\psi_i\rangle\langle\psi_i|$
and $Q_n^\perp = I - Q_n$, where $I$ is the identity operator in $\mathcal{H}$.

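Lemma 4.8.

(4.11)    $E_n^\psi = \operatorname{tr}(G Q_n^\perp)$.

Proof. The proof follows from the previous facts, using that

$\|f_\alpha - Q_n f_\alpha\|^2 = \|f_\alpha\|^2 - \|Q_n f_\alpha\|^2$

for all $\alpha \in A$ and $n \in \mathbb{N}$. The expression (4.4) for the error
term is motivated as follows. The vector components $f_\alpha$ in Definition 4.6,
eq. (4.8), are indexed by $\alpha \in A$ and are assigned weights $w_\alpha$. But
rather than computing $\sum_\alpha w_\alpha$ as in Lemma 4.7, we wish to replace the
vectors $f_\alpha$ with finite approximations $Q_n f_\alpha$; then the error term
(4.4) measures how well the approximation fits the data.

Lemma 4.9. $\operatorname{tr}(G Q_n) = \sum_{\alpha \in A} w_\alpha \|Q_n f_\alpha\|^2$.

Proof.

$\operatorname{tr}(G Q_n) = \sum_i \langle \psi_i | G Q_n \psi_i \rangle = \sum_{i=1}^{n} \langle \psi_i | G \psi_i \rangle$

$= \sum_{\alpha \in A} w_\alpha \sum_{i=1}^{n} |\langle \psi_i | f_\alpha \rangle|^2 = \sum_{\alpha \in A} w_\alpha \|Q_n f_\alpha\|^2$

as claimed. □

Proof of Lemma 4.8 continued. The relative error is represented by the difference:

$\sum_{\alpha \in A} w_\alpha - \sum_{\alpha \in A} w_\alpha \|Q_n f_\alpha\|^2 = \sum_{\alpha \in A} w_\alpha \|f_\alpha\|^2 - \sum_{\alpha \in A} w_\alpha \|Q_n f_\alpha\|^2$

$= \sum_{\alpha \in A} w_\alpha \bigl(\|f_\alpha\|^2 - \|Q_n f_\alpha\|^2\bigr) = \sum_{\alpha \in A} w_\alpha \|f_\alpha - Q_n f_\alpha\|^2$

$= \sum_{\alpha \in A} w_\alpha \|Q_n^\perp f_\alpha\|^2 = \operatorname{tr}(G Q_n^\perp)$.

□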
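
The identity of Lemma 4.8 can be confirmed numerically. The sketch below compares
the weighted residual sum $\sum_\alpha w_\alpha \|f_\alpha - Q_n f_\alpha\|^2$ with
$\operatorname{tr}(G Q_n^\perp)$ for a random weighted system of unit vectors in
$\mathbb{R}^5$ and an arbitrary ONB; all data and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 5, 12
f = rng.normal(size=(m, d))
f /= np.linalg.norm(f, axis=1, keepdims=True)     # unit vectors f_a (rows)
w = rng.uniform(0.1, 1.0, size=m)                 # weights w_a
G = (f.T * w) @ f                                 # G = sum_a w_a |f_a><f_a|

psi, _ = np.linalg.qr(rng.normal(size=(d, d)))    # columns: an arbitrary ONB (psi_i)

def error_direct(n):
    """Weighted sum of squared residuals ||f_a - Q_n f_a||^2."""
    Qn = psi[:, :n] @ psi[:, :n].T                # projection onto span{psi_1..psi_n}
    residual = f - f @ Qn                         # rows f_a - Q_n f_a (Q_n symmetric)
    return float(np.sum(w * np.sum(residual**2, axis=1)))

def error_trace(n):
    """tr(G Q_n^perp)."""
    Qn_perp = np.eye(d) - psi[:, :n] @ psi[:, :n].T
    return float(np.trace(G @ Qn_perp))

errors = [(error_direct(n), error_trace(n)) for n in range(d + 1)]
```

For $n = 0$ both expressions reduce to $\sum_\alpha w_\alpha = \operatorname{tr} G$,
and for $n = d$ both vanish, since then $Q_n = I$.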



Definition 4.10. If $G$ is a more general frame operator (4.10) and $(\psi_i)$ is
some ONB, we shall set $E_n^\psi := \operatorname{tr}(G Q_n^\perp)$; this is called
the error sequence.

The more general case of (4.10) where

(4.12)    $\operatorname{rank} P_\alpha = \operatorname{tr} P_\alpha > 1$

corresponds to what are called subspace frames, i.e., indexed families $(P_\alpha)$
of orthogonal projections such that there are $0 < c_1 \le c_2 < \infty$ and weights
$w_\alpha > 0$ such that

(4.13)    $c_1 \|f\|^2 \le \sum_{\alpha \in A} w_\alpha \|P_\alpha f\|^2 \le c_2 \|f\|^2$

for all $f \in \mathcal{H}$.

We now make these notions precise: 

Definition 4.11. A projection in a Hilbert space $\mathcal{H}$ is an operator $P$ in
$\mathcal{H}$ satisfying $P^* = P = P^2$. It is understood that our projections $P$
are orthogonal, i.e., that $P$ is a selfadjoint idempotent. The orthogonality is
essential because, by von Neumann, we know that there is then a 1-1 correspondence
between closed subspaces in $\mathcal{H}$ and (orthogonal) projections: every closed
subspace in $\mathcal{H}$ is the range of a unique projection.

We shall need the following generalization of the notion of frame from
Definition 4.2.

Definition 4.12. A fusion frame (or subspace frame) in a Hilbert space
$\mathcal{H}$ is an indexed system $(P_i, w_i)$ where each $P_i$ is a projection,
and where $(w_i)$ is a system of numerical weights, i.e., $w_i > 0$, such that
(4.13) holds.

It is clear (see also Section 6) that the notion of "fusion frame" contains
conventional frames (Definition 4.2) as a special case.

The property (4.13) for a given system controls the weighted overlaps of the
variety of subspaces $\mathcal{H}_i (:= P_i(\mathcal{H}))$ making up the system,
i.e., the overlaps of subspaces corresponding to different values of the index.
Typically the subspaces overlap pairwise. The case of no overlap, with mutually
orthogonal subspaces, happens precisely when $P_i P_j = 0$ for all pairs with $i$
and $j$ different. In frequency analysis, this might represent orthogonal frequency
bands.

When vectors in $\mathcal{H}$ represent signals, we think of bands of signals being
"fused" into the individual subspaces $\mathcal{H}_i$. Further, note that for a
given system of subspaces, or equivalently, projections, there may be many choices
of weights consistent with (4.13): The overlaps may be controlled, or weighted, in
a variety of ways. The choice of weights depends on the particular application at
hand.

Theorem 4.13. The Karhunen-Loeve ONB with respect to the frame operator $L^* L$
gives the smallest error terms in the approximation to a frame operator.

Proof. Given the operator $G$ which is trace class and positive semidefinite, we may
apply the spectral theorem to it. What results is a discrete spectrum, with the
natural order $\lambda_1 \ge \lambda_2 \ge \cdots$ and a corresponding ONB
$(\phi_k)$ consisting of eigenvectors, i.e.,

(4.14)    $G\phi_k = \lambda_k \phi_k$,  $k \in \mathbb{N}$,

called the Karhunen-Loeve data. The spectral data may be constructed recursively
starting with

(4.15)    $\lambda_1 = \sup_{\phi \in \mathcal{H},\ \|\phi\| = 1} \langle \phi | G\phi \rangle = \langle \phi_1 | G\phi_1 \rangle$

and

(4.16)    $\lambda_{k+1} = \sup_{\phi \perp \phi_1, \ldots, \phi_k,\ \|\phi\| = 1} \langle \phi | G\phi \rangle = \langle \phi_{k+1} | G\phi_{k+1} \rangle$.

Now an application of [2], Theorem 4.1 yields

(4.17)    $\sum_{k=1}^{n} \lambda_k \ge \operatorname{tr}(Q_n^\psi G) = \sum_{k=1}^{n} \langle \psi_k | G\psi_k \rangle$  for all $n$,

where $Q_n^\psi$ is the sequence of projections from (2.10), deriving from some ONB
$(\psi_i)$ and arranged such that

(4.18)    $\langle \psi_1 | G\psi_1 \rangle \ge \langle \psi_2 | G\psi_2 \rangle \ge \cdots$.

Hence we are comparing ordered sequences of eigenvalues with sequences of diagonal
matrix entries.
Finally, we have

$\operatorname{tr} G = \sum_{k=1}^{\infty} \lambda_k = \sum_{k=1}^{\infty} \langle \psi_k | G\psi_k \rangle < \infty$.

The assertion in Theorem 4.13 is the validity of

(4.19)    $E_n^\phi \le E_n^\psi$

for all $(\psi_i) \in ONB(\mathcal{H})$, and all $n = 1, 2, \ldots$; and moreover,
that the infimum on the RHS in (4.19) is attained for the KL-ONB $(\phi_k)$. But in
view of our lemma for $E_n^\psi$, Lemma 4.8, we see that (4.19) is equivalent to the
system (4.17) in the Arveson-Kadison theorem. □

The Arveson-Kadison theorem is the assertion (4.17) for trace class operators, see
e.g., refs pQ and [2]. That (4.19) is equivalent to (4.17) follows from the
definitions: by Lemma 4.8, $E_n^\psi = \operatorname{tr} G - \operatorname{tr}(Q_n^\psi G)$,
so the upper bound (4.17) on $\operatorname{tr}(Q_n^\psi G)$ is the same as the
lower bound (4.19) on $E_n^\psi$.
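
A finite-dimensional experiment illustrates the optimality statement (4.19). In
the sketch below $G$ is built from random unit vectors and weights in
$\mathbb{R}^6$ (hypothetical data), the K-L ONB is obtained from an
eigendecomposition, and the error sequences $E_n = \operatorname{tr} G -
\sum_{k \le n} \langle b_k | G b_k \rangle$ are compared for the K-L basis, the
standard basis, and a random ONB. The function name error_sequence is ours.

```python
import numpy as np

rng = np.random.default_rng(5)
d, m = 6, 20
f = rng.normal(size=(m, d))
f /= np.linalg.norm(f, axis=1, keepdims=True)      # unit vectors f_a (rows)
w = rng.uniform(0.1, 1.0, size=m)                  # weights w_a
G = (f.T * w) @ f                                  # G = sum_a w_a |f_a><f_a|

def error_sequence(basis):
    """E_n for n = 1..d, with diagonal entries ordered as in (4.18)."""
    diag = np.sort(np.einsum("ik,ij,jk->k", basis, G, basis))[::-1]
    return np.trace(G) - np.cumsum(diag)

lam, phi = np.linalg.eigh(G)                       # columns of phi: the K-L ONB
E_kl = error_sequence(phi)
E_std = error_sequence(np.eye(d))                  # standard basis
E_rnd = error_sequence(np.linalg.qr(rng.normal(size=(d, d)))[0])  # random ONB
```

In finite dimension this is the classical fact that partial sums of the ordered
eigenvalues dominate partial sums of the diagonal entries in any ONB, so `E_kl`
is entrywise no larger than the other two sequences.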

Remark 4.14. Even when the operator $G$ is not trace class, there is still a
conclusion about the relative error estimates. With two ONBs $(\phi_i)$ and
$(\psi_i) \in ONB(\mathcal{H})$ and with $n < m$, $m$ large, we may introduce the
following relative error terms:

$E_{n,m}^\phi = \operatorname{tr}\bigl(G(Q_m^\phi - Q_n^\phi)\bigr)$

and

$E_{n,m}^\psi = \operatorname{tr}\bigl(G(Q_m^\psi - Q_n^\psi)\bigr)$.

If $m$ is fixed, we then choose a Karhunen-Loeve basis $(\phi_i)$ for
$Q_m^\psi G Q_m^\psi$ and the following error inequality holds:

$E_{n,m}^{\phi,KL} \le E_{n,m}^\psi$.

Our next theorem gives Karhunen-Loeve optimality for sequences of entropy 
numbers. 

Theorem 4.15. The Karhunen-Loeve ONB gives the smallest sequence of entropy 
numbers in the approximation to a frame operator. 

Proof. We begin by a few facts about entropy of trace-class operators G. The 
entropy is defined as 

(4.20) S(G) := -tr(G log G). 

The formula will be used on cut-down versions of an initial operator G. In some 
cases only the cut-down might be trace-class. Since the Spectral Theorem applies 
to G, the RHS in (j4~20|) is also 

OC 

(4.21) S(G) = -^A fc logA fc . 

fe=i 

For simplicity we normalize such that 1 = trG = ^fc; ancl we introduce the 

partial sums 



(4.22) ^ L (G):=-^A fc logA fc . 

fc=i 

and 

n 

(4.23) St(G) := - £>*|G^ fc > log (i> k \G^ k ). 



k=l 



Let $(\psi_i) \in ONB(\mathcal{H})$, and set $d_k^{\psi} := \langle\psi_k|G\psi_k\rangle$; then the inequalities (4.17) take
the form

(4.24) $\operatorname{tr}(Q_n^{\psi}G) = \sum_{i=1}^{n} d_i^{\psi} \le \sum_{i=1}^{n} \lambda_i$, $n = 1, 2, \dots$

where as usual an ordering

(4.25) $d_1^{\psi} \ge d_2^{\psi} \ge \cdots$

has been chosen.

Now the function $\beta(t) := t \log t$ is convex. An application of Remark 6.3 in [2]
then yields

(4.26) $\sum_{i=1}^{n} \beta(d_i^{\psi}) \le \sum_{i=1}^{n} \beta(\lambda_i)$, $n = 1, 2, \dots$

Since the LHS in (4.26) is $-S_n^{\psi}(G)$ and the RHS is $\sum_{i=1}^{n} \lambda_i \log \lambda_i = -S_n^{KL}(G)$, the desired inequalities

(4.27) $S_n^{KL}(G) \le S_n^{\psi}(G)$, $n = 1, 2, \dots$

follow, i.e., the KL-data minimizes the sequence of entropy numbers. □
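
The quantities in (4.22) and (4.23) are easy to compute in finite dimensions. The following is a minimal numerical sketch, not part of the original argument: it assumes a randomly generated positive matrix `G` normalized to trace one, takes the eigenvalues as the KL data, and uses a random orthogonal matrix `Q` as the comparison ONB. The helper name `partial_entropies` and all variable names are hypothetical. Both sequences of partial sums are returned so they can be compared for each $n$; the sketch itself only asserts the comparison for the full sum ($n$ equal to the dimension).

```python
import numpy as np

def partial_entropies(diag):
    """Partial sums S_n = -sum_{k<=n} d_k log d_k, with d sorted in decreasing order."""
    d = np.sort(np.asarray(diag, dtype=float))[::-1]
    safe = np.where(d > 0, d, 1.0)              # avoid log(0); 0 log 0 is taken as 0
    terms = np.where(d > 0, -d * np.log(safe), 0.0)
    return np.cumsum(terms)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
G = A @ A.T
G /= np.trace(G)                                # normalize so that tr G = 1

eigvals = np.linalg.eigvalsh(G)                 # KL data: the eigenvalues lambda_k of G
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # columns form another ONB (psi_k)
diag_psi = np.einsum("ik,ij,jk->k", Q, G, Q)    # diagonal entries <psi_k | G psi_k>

S_kl = partial_entropies(eigvals)               # sequence (4.22)
S_psi = partial_entropies(diag_psi)             # sequence (4.23)
full_sum_ok = S_kl[-1] <= S_psi[-1] + 1e-12     # comparison at n = dim
```

Printing `S_kl` and `S_psi` side by side shows how the two sequences relate for the intermediate values of $n$ in a given example.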

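4.0.1. Supplement. Let $G$, $\mathcal{H}$ be as before: $\mathcal{H}$ is an $\infty$-dimensional Hilbert space,
$G = \sum_{\alpha} w_{\alpha}P_{\alpha}$, $w_{\alpha} > 0$, $P_{\alpha} = P_{\alpha}^* = P_{\alpha}^2$. Suppose $\dim \mathcal{H}_{\lambda_1}(G) > 0$, where
$\mathcal{H}_{\lambda_1}(G) = \{\phi \in \mathcal{H} \mid G\phi = \lambda_1\phi\}$ and $\lambda_1 := \sup\{\langle f|Gf\rangle : f \in \mathcal{H}, \|f\| = 1\}$; then define $\lambda_2, \lambda_3, \dots$
recursively

$\lambda_{k+1} := \sup\{\langle f|Gf\rangle \mid f \perp \phi_1, \phi_2, \dots, \phi_k,\ \|f\| = 1\}$

where $\dim \mathcal{H}_{\lambda_k}(G) > 0$. Set $\mathcal{K} = \bigvee_{k=1}^{\infty} \operatorname{span}\{\phi_1, \phi_2, \dots, \phi_k\}$. Set $\rho := \inf\{\lambda_k \mid k = 1, 2, \dots\}$;
then we can apply Theorems 4.13 and 4.15 to the restriction $(G - \rho I)|_{\mathcal{K}}$,
i.e., the operator $\mathcal{K} \to \mathcal{K}$ given by $\mathcal{K} \ni x \mapsto Gx - \rho x \in \mathcal{K}$.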

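Actually there are two cases for $G$, as for $G_{\mathcal{K}} = G - \rho I$: (1) compact, (2)
trace-class. We did (2), but we now discuss (1):

When $G$ or $G_{\mathcal{K}}$ is given, we want to consider

$\rho = \inf\{\langle f|Gf\rangle \mid \|f\| = 1\}$,
$\lambda_1 = \sup\{\langle f|Gf\rangle \mid \|f\| = 1\}$.

If $G = \sum_{\alpha} w_{\alpha}P_{\alpha}$ then $\langle f|Gf\rangle = \sum_{\alpha} w_{\alpha}\|P_{\alpha}f\|^2$. If $G = \sum_{\alpha} |h_{\alpha}\rangle\langle h_{\alpha}|$ where
$(h_{\alpha}) \subset \mathcal{H}$ is a family of vectors, then

$\langle f|Gf\rangle = \sum_{\alpha} |\langle h_{\alpha}|f\rangle|^2$.

The frame bound condition takes the form

$c_1\|f\|^2 \le \sum_{\alpha} w_{\alpha}\|P_{\alpha}f\|^2 \le c_2\|f\|^2$

or in the standard frame case $\{h_{\alpha}\}_{\alpha \in A} \in FRAME(\mathcal{H})$

$c_1\|f\|^2 \le \langle f|Gf\rangle = \sum_{\alpha} |\langle h_{\alpha}|f\rangle|^2 \le c_2\|f\|^2$.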

Lemma 4.16. If a frame system (fusion frame or standard frame) has frame bounds
$0 < c_1 \le c_2 < \infty$, then the spectrum of the operator $G$ is contained in the closed
interval $[c_1, c_2] = \{x \in \mathbb{R} \mid c_1 \le x \le c_2\}$.

Proof. It is clear from the formula for $G$ that $G = G^*$. Hence the spectral theorem
applies, and the result follows. In fact, if $z \in \mathbb{C} \setminus [c_1, c_2]$ then

$\|zf - Gf\| \ge \operatorname{dist}(z, [c_1, c_2])\|f\|$

so $zI - G$ is invertible. So $z$ is in the resolvent set. Hence $\operatorname{spec}(G) \subset [c_1, c_2]$. □

A frame is a system of vectors which satisfies the two a priori estimates (4.2),
so by the Dirac notation, it may be thought of as a statement about rank-one
projections. The notion of fusion frame is the same, only with finite-rank projections;
see (4.13).
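
For a finite frame, Lemma 4.16 can be checked directly. The sketch below is illustrative only: the frame vectors are random rows of a matrix `H` (an assumption, not data from the paper), the frame operator is formed as $G = \sum_n |h_n\rangle\langle h_n|$, and the optimal frame bounds are read off as the extreme eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((7, 3))      # rows h_n: seven illustrative frame vectors in R^3
G = H.T @ H                          # frame operator G = sum_n |h_n><h_n|

spec = np.linalg.eigvalsh(G)         # eigenvalues of the self-adjoint G, ascending
c1, c2 = spec[0], spec[-1]           # optimal frame bounds in finite dimensions

f = rng.standard_normal(3)
quad = np.sum((H @ f) ** 2)          # sum_n |<h_n|f>|^2, which equals <f|Gf>
norm2 = f @ f                        # ||f||^2
```

With these names, `c1 * norm2 <= quad <= c2 * norm2` is the frame condition, and every entry of `spec` lies in the interval `[c1, c2]` by construction.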

5. Splitting off rank-one operators 

The general principle in frame analysis is to make a recursion which introduces
rank-one operators, see Definition 2.2 and Facts section 2.3. The theorem we
will prove accomplishes that for a general class of operators in infinite-dimensional
Hilbert space. Our result may also be viewed as an extension of the Perron-Frobenius
theorem for positive matrices. Since we do not introduce positivity in the present
section, our theorem will instead include assumptions which restrict the spectrum
of the operators to which our result applies.

One way rank-one operators enter into frame analysis is through equation (4.7).
Under the assumptions in Lemma 4.5 the operator $L^*L$ is invertible. If we multiply
equation (4.7) on the left and on the right with $(L^*L)^{-1}$ we arrive at the following
two representations for the identity operator

(5.1) $I = \sum_{\alpha} |\tilde{h}_{\alpha}\rangle\langle h_{\alpha}| = \sum_{\alpha} |h_{\alpha}\rangle\langle\tilde{h}_{\alpha}|$

where $\tilde{h}_{\alpha} = (L^*L)^{-1}h_{\alpha}$.

Truncation of the sums in (5.1) yields non-selfadjoint operators which are used in
approximation of data with frames. Starting with a general non-selfadjoint operator 
T, our next theorem gives a general method for splitting off a rank-one operator 
from T. 

Theorem 5.1. Let $\mathcal{H}$ be a generally infinite-dimensional Hilbert space, and let $T$
be a bounded operator in $\mathcal{H}$. Under the following three assumptions we can split off
a rank-one operator from $T$. Specifically assume $a \in \mathbb{C}$ satisfies:

(1) $0 \ne a \in \operatorname{spec}(T)$ where $\operatorname{spec}(T)$ denotes the spectrum of $T$.

(2) $\dim R(a - T)^{\perp} = 1$ where $R(a - T)$ denotes the range of the operator $aI - T$,
and $\perp$ denotes the orthogonal complement.

(3) $\lim_{n \to \infty} a^{-n}T^n x = 0$ for all $x \in \overline{R(a - T)}$.

Then it follows that the limit exists everywhere in $\mathcal{H}$ in the strong topology of $B(\mathcal{H})$.
Moreover, we may pick the following representation

(5.2) $\lim_{n \to \infty} a^{-n}T^n = |\xi\rangle\langle w_1|$

for the limiting operator on $\mathcal{H}$, where

(5.3) $\|w_1\| = 1$, $T^*w_1 = \bar{a}w_1$, $\langle\xi|w_1\rangle = 1$,

in fact

(5.4) $\xi - w_1 \in \overline{R(a - T)}$, and $T\xi = a\xi$,

where the over-bar on $R(a - T)$ denotes closure.

Theorem 5.1 is an analogue of the Perron-Frobenius theorem (e.g., [H]). Dictated
by our applications, the present Theorem 5.1 is adapted to a different context
where the assumptions are different from those in the Perron-Frobenius theorem.
We have not seen it stated in the literature in this version, and the proof (and
conclusions) are different from that of the standard Perron-Frobenius theorem.

Remark 5.2. For the reader's benefit we include the following statement of the
Perron-Frobenius theorem in a formulation which makes it clear how Theorem 5.1
extends this theorem.

5.0.2. Perron-Frobenius: Let $d < \infty$ be given and let $T$ be a $d \times d$ matrix with
entries $T_{i,j} \ge 0$, and with positive spectral radius $a$. Then there is a column-vector
$w$ with $w_i \ge 0$, and a row-vector $\xi$ such that the following conditions hold:

$Tw = aw$, $\xi T = a\xi$, and $\xi w = 1$.
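
To make the finite-dimensional statement concrete, here is a small numerical sketch. It is an illustration under assumptions: the $2 \times 2$ matrix `T` is chosen arbitrarily with strictly positive entries, and the names `w`, `xi`, `limit` and `rank_one` are hypothetical. It computes the Perron-Frobenius data and compares the normalized power $a^{-n}T^n$ (for $n = 60$) with the rank-one matrix $w\,\xi$, which is the finite-dimensional counterpart of the limit in (5.2).

```python
import numpy as np

T = np.array([[1.0, 2.0],
              [3.0, 4.0]])                     # illustrative matrix with positive entries

vals, right = np.linalg.eig(T)
k = np.argmax(vals.real)
a = vals[k].real                               # spectral radius of T
w = np.abs(right[:, k].real)                   # column eigenvector: T w = a w

vals_l, left = np.linalg.eig(T.T)
xi = np.abs(left[:, np.argmax(vals_l.real)].real)  # row eigenvector: xi T = a xi
xi = xi / (xi @ w)                             # normalize so that xi w = 1

limit = np.linalg.matrix_power(T / a, 60)      # a^{-n} T^n for n = 60
rank_one = np.outer(w, xi)                     # the rank-one matrix w xi
```

For this matrix the second eigenvalue is small in modulus compared with `a`, so `limit` agrees with `rank_one` to high accuracy.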

Proof (of Theorem 5.1). Note that by general operator theory, we have the
following formulas:

$R(a - T)^{\perp} = N(T^* - \bar{a}) = \{y \in \mathcal{H} \mid T^*y = \bar{a}y\}$.

By assumption (2) this is a one-dimensional space, and we may pick $w_1$ such that
$T^*w_1 = \bar{a}w_1$, and $\|w_1\| = 1$. This means that

$\{\mathbb{C}w_1\}^{\perp} = R(a - T)^{\perp\perp} = \overline{R(a - T)}$

is invariant under $T$.

As a result, there is a second bounded operator $G$ which maps the space $\overline{R(a - T)}$
into itself, and restricts $T$, i.e. $T|_{\overline{R(a - T)}} = G$. Further, there is a vector $\eta^{\perp} \in (\mathbb{C}w_1)^{\perp}$
such that, relative to the decomposition $\mathcal{H} = \mathbb{C}w_1 \oplus (w_1)^{\perp}$, $T$ has the following matrix representation

$T = \begin{pmatrix} a & 0 \; 0 \; \cdots \\ \eta^{\perp} & G \end{pmatrix}$.

The entry $a$ in the top left matrix corner represents the following operator, $sw_1 \mapsto asw_1$.
The vector $\eta^{\perp}$ is fixed, and $Tw_1 = aw_1 + \eta^{\perp}$. The entry $\eta^{\perp}$ in the bottom
left matrix corner represents the operator $sw_1 \mapsto s\eta^{\perp}$, or $|\eta^{\perp}\rangle\langle w_1|$.

In more detail: If $Q_1$ and $Q_1^{\perp} = I - Q_1$ denote the respective projections onto
$\mathbb{C}w_1$ and $w_1^{\perp}$, then

$Q_1TQ_1 = aQ_1$,

$Q_1^{\perp}TQ_1 = |\eta^{\perp}\rangle\langle w_1|$,

$Q_1TQ_1^{\perp} = 0$, and

$Q_1^{\perp}TQ_1^{\perp} = G$.

Using now assumptions (1) and (2) in the theorem, we can conclude that the operator
$a - G$ is invertible with bounded inverse

$(a - G)^{-1} : (w_1)^{\perp} \to (w_1)^{\perp}$.

We now turn to powers of the operator $T$. An induction yields the following matrix
representation:

$T^n = \begin{pmatrix} a^n & 0 \; 0 \; \cdots \\ (a^n - G^n)(a - G)^{-1}\eta^{\perp} & G^n \end{pmatrix}$.

Finally an application of assumption (3) yields the following operator limit

$a^{-n}T^n \to \begin{pmatrix} 1 & 0 \; 0 \; \cdots \\ (a - G)^{-1}\eta^{\perp} & 0 \end{pmatrix}$ as $n \to \infty$.

We used that $\eta^{\perp} \in \overline{R(a - T)}$, and that

$a^{-n}(a^n - G^n)(a - G)^{-1}\eta^{\perp} = (1 - a^{-n}G^n)(a - G)^{-1}\eta^{\perp} \to (a - G)^{-1}\eta^{\perp}$ as $n \to \infty$.

Further, if we set $\xi := w_1 + (a - G)^{-1}\eta^{\perp}$, then

$T\xi = aw_1 + \eta^{\perp} + G(a - G)^{-1}\eta^{\perp} = aw_1 + a(a - G)^{-1}\eta^{\perp} = a\xi$.

Finally, note that

$(a - G)^{-1}\eta^{\perp} \in \overline{R(a - T)} = (w_1)^{\perp}$.

It is now immediate from this that all of the statements in the conclusion of the
theorem including (5.2), (5.3) and (5.4) are satisfied for the two vectors $w_1$ and
$\xi$. □

6. Weighted Frames and Weighted Frame-Operators 

In this section we address the fact that when frames are considered in an
infinite-dimensional separable Hilbert space, the trace-class condition may not hold.

There are several remedies to this; one is the introduction of a certain weighting
into the analysis. Our weighting is done as follows in the simplest case: Let $(h_n)_{n \in \mathbb{N}}$
be a sequence of vectors in some fixed Hilbert space $\mathcal{H}$, and suppose the frame
condition from Definition 4.2 is satisfied for all $f \in \mathcal{H}$. We say that $(h_n)$ is a frame. As
in Section 2 we introduce the analysis operator $L$:

$\mathcal{H} \ni f \mapsto (\langle h_n|f\rangle)_n \in \ell^2$

and the two operators

(6.1) $G := L^*L : \mathcal{H} \to \mathcal{H}$

and

(6.2) $G_R := LL^* : \ell^2 \to \ell^2$

(the Grammian).
As noted,

(6.3) $G = \sum_{n \in \mathbb{N}} |h_n\rangle\langle h_n|$

and $G_R$ is matrix-multiplication in $\ell^2$ by the matrix $(\langle h_i|h_j\rangle)$, i.e.,

$\ell^2 \ni x = (x_i) \mapsto (G_Rx) = y = (y_i)$

where

(6.4) $y_j = \sum_i \langle h_j|h_i\rangle x_i$.

Proposition 6.1. Let $\{h_n\}$ be a set of vectors in a Hilbert space (infinite-dimensional,
separable), and suppose these vectors form a frame with frame-bounds $c_1$, $c_2$.

(a) Let $(v_n)$ be a fixed sequence of scalars in $\ell^2$. Then the frame operator
$G = G_v$ formed from the weighted sequence $\{v_nh_n\}$ is trace-class.

(b) If $\sum_{n=1}^{\infty} |v_n|^2 = 1$, then the upper frame bound for $\{v_nh_n\}$ is also $c_2$.

(c) Pick a finite subset $F$ of the index set, typically the natural numbers $\mathbb{N}$,
and then pick $(v_n)$ in $\ell^2$ such that $v_n = 1$ for all $n$ in $F$. Then on this set
$F$ the weighted frame agrees with the initial system of frame vectors $\{h_n\}$,
and the weighted frame operator $G_v$ is not changed on $F$.
Proof. (a) Starting with the initial frame $\{h_n\}_{n \in \mathbb{N}}$ we form the weighted system
$\{v_nh_n\}$. The weighted frame operator arises from applying (6.3) to this modified
system, i.e.,

(6.5) $G_v = \sum_{n \in \mathbb{N}} |v_nh_n\rangle\langle v_nh_n| = \sum_{n \in \mathbb{N}} |v_n|^2 |h_n\rangle\langle h_n|$.

Let $(e_n)$ be the canonical ONB in $\ell^2$, i.e., $(e_n)_k := \delta_{n,k}$. Then $h_n = L^*e_n$, so

$\|h_n\| \le \|L^*\|\|e_n\| = \|L^*\| = \|L\|$.

Now apply the trace to (6.5): Suppose $\|(v_n)\|_{\ell^2} = 1$. Then

$\operatorname{tr} G_v = \sum_{n \in \mathbb{N}} |v_n|^2 \operatorname{tr} |h_n\rangle\langle h_n| = \sum_{n \in \mathbb{N}} |v_n|^2\|h_n\|^2 \le \|L\|^2\|(v_n)\|^2_{\ell^2}$

$= \|L\|^2 = \|LL^*\| = \|L^*L\| = \|G\| = \sup(\operatorname{spec}(G))$.

(Note that the estimate shows more: The sum of the eigenvalues of $G_v$ is dominated
by the top eigenvalue of $G$.) But we recall (4.0.1) that $(h_n)$ is a frame with
frame-bounds $c_1$, $c_2$. It follows (4.0.1) that $\operatorname{spec}(G) \subset [c_1, c_2]$. This holds also if $c_1$ is
the largest lower bound in (4.2), and $c_2$ the smallest upper bound; i.e., the optimal
frame bounds.

Hence $c_2$ is the spectral radius of $G$, and also $c_2 = \|G\|$. The conclusion in
(a)-(b) follows.

(c) The conclusion in (c) is an immediate consequence, but now

$\operatorname{tr} G_v = \sum_{n \in \mathbb{N}} |v_n|^2\|h_n\|^2 \le \|G\| \sum_{n \in \mathbb{N}} |v_n|^2 = c_2 \left(\#F + \sum_{n \in \mathbb{N} \setminus F} |v_n|^2\right)$

where $\#F$ is the cardinality of the set specified in (c). □

Remark 6.2. Let $\{h_n\}$ and $(v_n) \in \ell^2$ be as in the proposition and let $D_v$ be the
diagonal operator with the sequence $(v_n)$ down the diagonal. Then $G_v = L^*|D_v|^2L$,
and $G_{R,v} = D_v^*G_RD_v$; where

$|D_v|^2 = \overline{D_v}D_v$.
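
The identities in Remark 6.2 and the trace estimate in the proof of Proposition 6.1 can be checked numerically for a finite frame. The following sketch works in the real case, where $D_v^* = D_v$; the random matrix `H` (rows $h_n$) and the unit weight vector `v` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 8, 3
H = rng.standard_normal((N, d))      # rows h_n: an illustrative finite frame in R^3
L = H                                # analysis operator f -> (<h_n|f>)_n as a matrix
G = L.T @ L                          # frame operator L*L, as in (6.1)
G_R = L @ L.T                        # Grammian LL*, as in (6.2)

v = rng.standard_normal(N)
v /= np.linalg.norm(v)               # weights with sum_n |v_n|^2 = 1
D = np.diag(v)                       # diagonal operator D_v

Hv = v[:, None] * H                  # rows v_n h_n: the weighted system
G_v = Hv.T @ Hv                      # weighted frame operator, as in (6.5)
G_Rv = Hv @ Hv.T                     # Grammian of the weighted system

trace_bound_ok = np.trace(G_v) <= np.linalg.norm(G, 2) + 1e-12   # tr G_v <= ||G||
```

Here `np.linalg.norm(G, 2)` is the operator norm of `G`, i.e. its top eigenvalue.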



6.1. $B(\mathcal{H}) = T(\mathcal{H})^*$. The formula $B(\mathcal{H}) = T(\mathcal{H})^*$ summarizes the known fact [22]
that $B(\mathcal{H})$ is the Banach dual of the Banach space of all trace-class operators.

The conditions (4.13) and (4.2) which introduce frames (both in vector form
and fusion form) may be recast with the use of this duality.

Proposition 6.3. An operator $G$ arising from a vector system $(h_{\alpha}) \subset \mathcal{H}$, or from
a projection system $(w_{\alpha}, P_{\alpha})$, yields a frame with frame bounds $c_1$ and $c_2$ if and
only if

(6.6) $c_1\operatorname{tr}(\rho) \le \operatorname{tr}(\rho G) \le c_2\operatorname{tr}(\rho)$

for all positive trace-class operators $\rho$ on $\mathcal{H}$.
Proof. Since both (4.13) and (4.2) may be stated in the form

$c_1\|f\|^2 \le \langle f|Gf\rangle \le c_2\|f\|^2$

and

$\operatorname{tr}(|f\rangle\langle f|) = \|f\|^2$,

it is clear that (6.6) is sufficient.

To see it is necessary, suppose (4.13) holds, and that $\rho$ is a positive trace-class operator.
By the spectral theorem, there is an ONB $(f_i)$, and $\xi_i \ge 0$ such that

$\rho = \sum_i \xi_i |f_i\rangle\langle f_i|$.

We now use the estimates

$c_1 \le \langle f_i|Gf_i\rangle \le c_2$

in

$\operatorname{tr}(\rho G) = \sum_i \xi_i\langle f_i|Gf_i\rangle$.

Since $\operatorname{tr}(\rho) = \sum_i \xi_i$, the conclusion (6.6) follows. □

Remark 6.4. Since quantum mechanical states (see [55]) take the form of density
matrices, the proposition makes a connection between frame theory and quantum
states. Recall, a density matrix is an operator $\rho \in T(\mathcal{H})_+$ with $\operatorname{tr}(\rho) = 1$.
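
As a quick illustration of (6.6) for a density matrix, the sketch below builds a random finite frame operator `G` and a random density matrix `rho` (both are assumptions for illustration, not objects from the paper) and evaluates $\operatorname{tr}(\rho G)$, which must land between the optimal frame bounds.

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.standard_normal((9, 4))      # rows h_n: illustrative frame vectors in R^4
G = H.T @ H                          # frame operator
spec = np.linalg.eigvalsh(G)
c1, c2 = spec[0], spec[-1]           # optimal frame bounds

B = rng.standard_normal((4, 4))
rho = B @ B.T                        # a positive operator
rho /= np.trace(rho)                 # normalize to tr(rho) = 1: a density matrix

val = np.trace(rho @ G)              # tr(rho G), expected to lie in [c1, c2]
```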

7. Localization 

Starting with a frame $(h_n)_{n \in \mathbb{N}}$ of non-zero vectors, with index set $\mathbb{N}$ for simplicity (see
(4.2)), we introduce the operators

(7.1) $G := \sum_{n \in \mathbb{N}} |h_n\rangle\langle h_n|$

(7.2) $G_v := \sum_{n \in \mathbb{N}} |v_n|^2 |h_n\rangle\langle h_n|$ for $v \in \ell^2$;

and the components

(7.3) $G_{h_n} := |h_n\rangle\langle h_n|$.

We further note that the individual operators $G_{h_n}$ in (7.3) are included in the
$\ell^2$-indexed family $G_v$ of (7.2). To see this, take

(7.4) $v = e_n = (0, 0, \dots, 0, 1, 0, \dots)$ where 1 is in the $n$-th place.

It is immediate that the nonzero spectrum of $G_{h_n}$ is the singleton $\|h_n\|^2$, and we may
take $\|h_n\|^{-1}h_n$ as a normalized eigenvector. Hence for the components $G_{h_n}$, there
are global entropy considerations. Still in applications, it is the sequence of local
approximations

(7.5) $\sum_{i=1}^{m} \langle\psi_i|h_n\rangle\psi_i = Q_m^{\psi}h_n$

which is accessible. It is computed relative to some ONB $(\psi_i)$. The corresponding
sequence of entropy numbers is:

(7.6) $S_m^{\psi}(h_n) := -\sum_{i=1}^{m} |\langle\psi_i|h_n\rangle|^2 \log |\langle\psi_i|h_n\rangle|^2$.
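The next result shows that for every $v \in \ell^2$ with $\|v\|_{\ell^2} = 1$, the combined
operator $G_v$ always is entropy-improving in the following precise sense.

Proposition 7.1. Consider the operators $G_v$ and $G_{h_n}$ introduced in (7.2) and
(7.3). Suppose $v \in \ell^2$ satisfies $\|v\|_{\ell^2} = 1$. Then for every ONB $(\psi_i)$ and for every
$m$,

(7.7) $S_m^{\psi}(G_v) \ge \sum_{n \in \mathbb{N}} |v_n|^2 S_m^{\psi}(G_{h_n})$.

Proof. Let $v$, $\psi$, and $m$ be as specified in the proposition. Introduce the convex
function $\beta(t) := t \log t$, $t \in [0, 1]$, with the convention that $\beta(0) = \beta(1) = 0$. Since
$\langle\psi_i|G_{h_n}\psi_i\rangle = |\langle\psi_i|h_n\rangle|^2$, we have $S_m^{\psi}(G_{h_n}) = S_m^{\psi}(h_n)$, and

$-S_m^{\psi}(G_v) = \sum_{i=1}^{m} \beta(\langle\psi_i|G_v\psi_i\rangle) \le \sum_{i=1}^{m} \sum_{n \in \mathbb{N}} |v_n|^2 \beta(\langle\psi_i|G_{h_n}\psi_i\rangle)$

$= \sum_{n \in \mathbb{N}} |v_n|^2 \sum_{i=1}^{m} \beta(|\langle\psi_i|h_n\rangle|^2) = -\sum_{n \in \mathbb{N}} |v_n|^2 S_m^{\psi}(h_n)$

where we used that $\beta$ is convex and that $\sum_n |v_n|^2 = 1$. In the last step, formula (7.6) was used. This
proves (7.7) in the proposition. □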
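
The convexity step in the proof is Jensen's inequality applied coordinate by coordinate, and it can be observed numerically. The sketch below uses a random finite system: `H` holds illustrative vectors $h_n$ as rows, the columns of `Psi` form an ONB, and `p` holds the weights $|v_n|^2$. All names are hypothetical.

```python
import numpy as np

def beta(t):
    """beta(t) = t log t, with beta(0) = 0."""
    t = np.asarray(t, dtype=float)
    safe = np.where(t > 0, t, 1.0)
    return np.where(t > 0, t * np.log(safe), 0.0)

rng = np.random.default_rng(4)
N, d, m = 5, 4, 3
H = rng.standard_normal((N, d))                     # rows h_n (illustrative)
Psi, _ = np.linalg.qr(rng.standard_normal((d, d)))  # columns psi_i: an ONB of R^4
v = rng.standard_normal(N)
v /= np.linalg.norm(v)                              # unit vector of weights
p = v ** 2                                          # |v_n|^2, summing to 1

coef = (H @ Psi[:, :m]) ** 2        # coef[n, i] = |<psi_i|h_n>|^2 = <psi_i|G_{h_n} psi_i>
S_h = -beta(coef).sum(axis=1)       # S_m^psi(h_n) for each n, as in (7.6)
S_Gv = -beta(p @ coef).sum()        # S_m^psi(G_v), using <psi_i|G_v psi_i> = sum_n p_n coef[n, i]
rhs = p @ S_h                       # right-hand side of (7.7)
```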



8. Engineering Applications 



In wavelet image compression, wavelet decomposition is performed on a digital
image. Here, an image is treated as a matrix of functions where the entries
are pixels. The following is an example of a representation for a digitized image
function:

(8.1) $f(x,y) = \begin{pmatrix} f(0,0) & f(0,1) & \cdots & f(0,N-1) \\ f(1,0) & f(1,1) & \cdots & f(1,N-1) \\ \vdots & \vdots & & \vdots \\ f(M-1,0) & f(M-1,1) & \cdots & f(M-1,N-1) \end{pmatrix}$

After the decomposition, quantization is performed on the image. The quantization
may be lossy (meaning some information is lost) or lossless. Then a lossless
means of compression, entropy encoding, is applied to the image to minimize the
memory space needed for storage or transmission. Here the mechanism of entropy
encoding will be discussed.



8.1. Entropy Encoding. In most images, neighboring pixels are correlated
and thus contain redundant information. Our task is to find a less correlated
representation of the image, then perform redundancy reduction and irrelevancy reduction.
Redundancy reduction removes duplication from the signal source (for instance a
digital image). Irrelevancy reduction omits parts of the signal that will not be
noticed by the Human Visual System (HVS).

Entropy encoding further compresses the quantized values in a lossless manner,
which gives better overall compression. It uses a model to accurately determine
the probabilities for each quantized value and produces an appropriate code based
on these probabilities so that the resultant output code stream will be smaller than
the input stream.

8.1.1. Some Terminology.

(i) Spatial Redundancy refers to correlation between neighboring pixel values. 

(ii) Spectral Redundancy refers to correlation between different color planes 
or spectral bands. 

8.2. The Algorithm. Our aim is to reduce the number of bits needed to represent 
an image by removing redundancies as much as possible. 

The algorithm for entropy encoding using Karhunen-Loeve expansion can be 
described as follows: 

1. Perform the wavelet transform for the whole image (i.e., wavelet
decomposition).

2. Do quantization to all coefficients in the image matrix, except the average
detail.

3. Subtract the mean: Subtract the mean from each of the data dimensions.
This produces a data set whose mean is zero.

4. Compute the covariance matrix.

5. Compute the eigenvectors and eigenvalues of the covariance matrix.

6. Choose components and form a feature vector (matrix of vectors),

$(\mathrm{eig}_1, \dots, \mathrm{eig}_n)$

Eigenvectors are listed in decreasing order of the magnitude of their
eigenvalues. Eigenvalues found in step 5 are different in values. The eigenvector
with the highest eigenvalue is the principal component of the data set.

7. Derive the new data set. 

Final Data = Row Feature Matrix × Row Data Adjust.

Row Feature Matrix is the matrix that has the eigenvectors in its rows with the 
most significant eigenvector (i.e., with the greatest eigenvalue) at the top row of 
the matrix. Row Data Adjust is the matrix with mean-adjusted data transposed. 
That is, the matrix contains the data items in each column with each row having 
a separate dimension. [29] 
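
Steps 3 to 7 of the algorithm can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `kl_transform` is hypothetical, the input is a synthetic data matrix with one observation per row rather than a quantized wavelet image, and the covariance uses NumPy's default $1/(n-1)$ normalization, which is one possible convention.

```python
import numpy as np

def kl_transform(data):
    """Steps 3-7 for a data matrix with one observation per row, one dimension per column."""
    adjusted = data - data.mean(axis=0)            # step 3: subtract the mean of each dimension
    cov = np.cov(adjusted, rowvar=False)           # step 4: covariance matrix (1/(n-1) convention)
    eigvals, eigvecs = np.linalg.eigh(cov)         # step 5: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]              # step 6: sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    row_feature = eigvecs.T                        # Row Feature Matrix: eigenvectors in rows
    final = row_feature @ adjusted.T               # step 7: Final Data
    return eigvals, row_feature, final

rng = np.random.default_rng(5)
mix = np.array([[3.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.0, 0.0, 0.2]])
data = rng.standard_normal((200, 3)) @ mix         # synthetic correlated data
eigvals, row_feature, final = kl_transform(data)
```

In the new coordinates the covariance matrix of `final` is diagonal, with the eigenvalues on the diagonal in decreasing order; this is the decorrelation that the entropy encoding step relies on.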

Inside the paper we use $(\phi_i)$ and $(\psi_i)$ to denote generic ONBs for a Hilbert space.
However, in wavelet theory [5], it is traditional to reserve $\phi$ for the father function
and $\psi$ for the mother function. A 1-level wavelet transform of an $N \times M$ image
can be represented as

(8.2) $f \mapsto \begin{pmatrix} a^1 & h^1 \\ v^1 & d^1 \end{pmatrix}$

where the subimages $h^1$, $d^1$, $a^1$ and $v^1$ each have the dimension of $N/2$ by $M/2$.

(8.3)
$a^1 = V_m^1 \otimes V_n^1 : \phi^A(x,y) = \phi(x)\phi(y) = \sum_i \sum_j h_ih_j\phi(2x - i)\phi(2y - j)$
$h^1 = V_m^1 \otimes W_n^1 : \psi^H(x,y) = \psi(x)\phi(y) = \sum_i \sum_j g_ih_j\phi(2x - i)\phi(2y - j)$
$v^1 = W_m^1 \otimes V_n^1 : \psi^V(x,y) = \phi(x)\psi(y) = \sum_i \sum_j h_ig_j\phi(2x - i)\phi(2y - j)$
$d^1 = W_m^1 \otimes W_n^1 : \psi^D(x,y) = \psi(x)\psi(y) = \sum_i \sum_j g_ig_j\phi(2x - i)\phi(2y - j)$
where $\phi$ is the father function and $\psi$ is the mother function in the sense of wavelets, the $V$
spaces denote the average spaces and the $W$ spaces are the difference spaces from
multiresolution analysis (MRA) [9]. Note that, on the very right hand side, in
each of the four systems of equations, we have the affinely transformed function $\phi$
occurring in both places under each of the double summations. Reason: Each
of the two functions, father function $\phi$ and mother function $\psi$, satisfies a scaling
relation. So both $\phi$ and $\psi$ are expressed in terms of the scaled family of functions
$\phi(2\,\cdot - j)$ with $j$ ranging over $\mathbb{Z}$, and the numbers $h_i$ are used in the formula for $\phi$,
while the numbers $g_j$ are used for $\psi$. Specifically, the relations are:

$\phi(x) = \sum_i h_i\phi(2x - i)$, and $\psi(x) = \sum_j g_j\phi(2x - j)$.

$h_i$ and $g_j$ are low-pass and high-pass filter coefficients respectively. $a^1$ denotes
the first averaged image, which consists of average intensity values of the original
image. Note that only the $\phi$ function, $V$ space and $h$ coefficients are used here. $h^1$
denotes the first detail image of horizontal components, which consists of intensity
differences along the vertical axis of the original image. Note that the $\phi$ function is used
on $y$ and the $\psi$ function on $x$, $W$ space for $x$ values and $V$ space for $y$ values; and
both $h$ and $g$ coefficients are used accordingly. $v^1$ denotes the first detail image
of vertical components, which consists of intensity differences along the horizontal
axis of the original image. Note that the $\phi$ function is used on $x$ and the $\psi$ function on $y$,
$W$ space for $y$ values and $V$ space for $x$ values; and both $h$ and $g$ coefficients are
used accordingly. $d^1$ denotes the first detail image of diagonal components, which
consists of intensity differences along the diagonal axis of the original image. It could
be noted that only the $\psi$ function, $W$ space and $g$ coefficients are used here. The
original image is reconstructed from the decomposed image by taking the sum of
the averaged image and the detail images and scaling by a scaling factor. See [34],
[32].
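
The four subimages can be illustrated with the simplest filter pair. The sketch below assumes the Haar wavelet with an averaging normalization (two-tap filters with weights $1/2$), which is only one possible choice of the coefficients $h_i$, $g_j$ and is not prescribed by the text; the function names `haar_level1` and `haar_inverse1` are hypothetical, and the image is assumed to have even height and width.

```python
import numpy as np

def haar_level1(image):
    """One-level 2D Haar transform (averaging normalization); returns a1, h1, v1, d1."""
    f = np.asarray(image, dtype=float)
    lo = (f[:, 0::2] + f[:, 1::2]) / 2.0     # averages of horizontally adjacent pixels
    hi = (f[:, 0::2] - f[:, 1::2]) / 2.0     # differences along the horizontal axis
    a1 = (lo[0::2, :] + lo[1::2, :]) / 2.0   # averaged image
    h1 = (lo[0::2, :] - lo[1::2, :]) / 2.0   # detail: differences along the vertical axis
    v1 = (hi[0::2, :] + hi[1::2, :]) / 2.0   # detail: differences along the horizontal axis
    d1 = (hi[0::2, :] - hi[1::2, :]) / 2.0   # detail: diagonal differences
    return a1, h1, v1, d1

def haar_inverse1(a1, h1, v1, d1):
    """Reconstruct the image from sums and differences of the four subimages."""
    lo = np.empty((2 * a1.shape[0], a1.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2], lo[1::2] = a1 + h1, a1 - h1
    hi[0::2], hi[1::2] = v1 + d1, v1 - d1
    f = np.empty((lo.shape[0], 2 * lo.shape[1]))
    f[:, 0::2], f[:, 1::2] = lo + hi, lo - hi
    return f

rng = np.random.default_rng(6)
image = rng.integers(0, 256, size=(4, 6)).astype(float)   # small synthetic "image"
a1, h1, v1, d1 = haar_level1(image)
restored = haar_inverse1(a1, h1, v1, d1)
```

Each subimage has half the height and half the width of the input, matching the block layout in (8.2), and the inverse recovers the input exactly up to rounding.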

This decomposition is not limited to one step; it can be repeated
on the averaged detail, depending on the size of the image. Once it
stops at a certain level, quantization (see [59], [2E], [33]) is done on the image. This
quantization step may be lossy or lossless. Then the lossless entropy encoding is
done on the decomposed and quantized image.

There are various means of quantization, and one commonly used method is called
thresholding. Thresholding is a method of data reduction which puts 0 for the
pixel values below the threshold value, or some other 'appropriate' value.
Soft thresholding is defined as follows:

(8.4) $T_{soft}(x) = \begin{cases} 0 & \text{if } |x| \le \lambda \\ x - \lambda & \text{if } x > \lambda \\ x + \lambda & \text{if } x < -\lambda \end{cases}$

and hard thresholding as follows:

(8.5) $T_{hard}(x) = \begin{cases} 0 & \text{if } |x| \le \lambda \\ x & \text{if } |x| > \lambda \end{cases}$

where $\lambda \in \mathbb{R}_+$ and $x$ is a pixel value. As can be observed from the
definitions, the difference between them is in how the coefficients larger
than the threshold value $\lambda$ in absolute value are handled. In hard thresholding,
these coefficient values are left alone, whereas in soft thresholding, the coefficient
values are decreased by $\lambda$ if positive and increased by $\lambda$ if negative [35].
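
The two thresholding rules translate directly into vectorized code. In this sketch the function names and the parameter name `lam` (standing for the threshold $\lambda$) are hypothetical, and the sample coefficients are made up for illustration.

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft thresholding (8.4): zero out small values, shrink the rest toward 0 by lam."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def hard_threshold(x, lam):
    """Hard thresholding (8.5): zero out small values, leave the rest unchanged."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > lam, x, 0.0)

coeffs = np.array([-3.0, -0.5, 0.0, 0.4, 2.0])
soft = soft_threshold(coeffs, 1.0)   # values -2, 0, 0, 0, 1
hard = hard_threshold(coeffs, 1.0)   # values -3, 0, 0, 0, 2
```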

Starting with a matrix representation for a particular image, we then compute
the covariance matrix using steps (3) and (4) in the algorithm above. Next, we
compute the Karhunen-Loeve eigenvalues. As usual, we arrange the eigenvalues in
decreasing order. The corresponding eigenvectors are arranged to match the
eigenvalues with multiplicity. The eigenvalues mentioned here are the same eigenvalues as in
Theorem 4.13 and Theorem 4.15, thus yielding the smallest error and smallest entropy
in the computation.

The Karhunen-Loeve transform or Principal Components Analysis (PCA) allows
us to better represent each pixel of the image matrix with the smallest number
of bits. It enables us to assign the smallest number of bits to the pixel value that has
the highest probability, then the next number to the pixel value that has the second
highest probability, and so forth; thus the pixel value that has the smallest probability gets
assigned the highest value among all the other pixel values.

An example with letters in the text would better depict how the mechanism 
works. Suppose we have a text with letters a, e, f, q, r with the following probability 
distribution: 



Letter    Probability
a         0.3
e         0.2
f         0.2
q         0.2
r         0.1



The Shannon-Fano entropy encoding algorithm is outlined as follows:

• List all letters with their probabilities, in decreasing order of their
probabilities.

• Divide the list into two parts with approximately equal probability (i.e.,
the total of probabilities of each part sums up to approximately 0.5).

• For the letters in the first part start the code with a 0 bit and for those in
the second part with a 1.

• Recursively continue until each subdivision is left with just one letter [3].

Then applying the Shannon-Fano entropy encoding scheme on the above table
gives us the following assignment.



Letter    Probability    Code
a         0.3            00
e         0.2            01
f         0.2            100
q         0.2            101
r         0.1            110



Note that instead of using 8 bits to represent a letter, 2 or 3 bits are used to
represent the letters in this case.
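
The recursive procedure outlined above can be sketched as follows. This is one possible reading of the splitting rule, stated as an assumption: at each level the list is cut at the point where the two parts have totals as close to each other as possible. The function name `shannon_fano` is hypothetical. With this rule the codes for f, q and r come out as 10, 110 and 111, which differs from the assignment in the table; both assignments are prefix-free and use 2 or 3 bits per letter.

```python
def shannon_fano(symbols):
    """symbols: list of (letter, probability) pairs sorted by decreasing probability."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Find the cut k that makes the two parts as close to equal probability as possible.
    best_k, best_diff, running = 1, float("inf"), 0.0
    for k in range(1, len(symbols)):
        running += symbols[k - 1][1]
        diff = abs(2 * running - total)
        if diff < best_diff - 1e-12:
            best_k, best_diff = k, diff
    codes = {}
    for letter, code in shannon_fano(symbols[:best_k]).items():
        codes[letter] = "0" + code       # first part: prepend a 0 bit
    for letter, code in shannon_fano(symbols[best_k:]).items():
        codes[letter] = "1" + code       # second part: prepend a 1 bit
    return codes

table = [("a", 0.3), ("e", 0.2), ("f", 0.2), ("q", 0.2), ("r", 0.1)]
codes = shannon_fano(table)
avg_len = sum(p * len(codes[s]) for s, p in table)   # expected code length in bits
```

For the table above the expected code length is 2.3 bits per letter, compared with 8 bits for a fixed-length byte encoding.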

8.3. Benefits of Entropy Encoding. One might think that the quantization
step suffices for compression. It is true that the quantization does compress the
data tremendously. After the quantization step many of the pixel values are either
eliminated or replaced with other suitable values. However, those pixel values are
still represented with either 8 or 16 bits. See 1.1. So we aim to minimize the
number of bits used by means of entropy encoding. The Karhunen-Loeve transform, or
PCA, makes it possible to represent each pixel of the digital image with the least
bit representation according to its probability, thus yielding a lossless optimized
representation using the least amount of memory.